Ensemble classifier for protein fold pattern recognition

نویسندگان

  • Hong-Bin Shen
  • Kuo-Chen Chou
چکیده

MOTIVATION Prediction of protein folding patterns is one level deeper than that of protein structural classes, and hence is much more complicated and difficult. To deal with such a challenging problem, the ensemble classifier was introduced. It was formed by a set of basic classifiers, with each trained in different parameter systems, such as predicted secondary structure, hydrophobicity, van der Waals volume, polarity, polarizability, as well as different dimensions of pseudo-amino acid composition, which were extracted from a training dataset. The operation engine for the constituent individual classifiers was OET-KNN (optimized evidence-theoretic k-nearest neighbors) rule. Their outcomes were combined through a weighted voting to give a final determination for classifying a query protein. The recognition was to find the true fold among the 27 possible patterns. RESULTS The overall success rate thus obtained was 62% for a testing dataset where most of the proteins have <25% sequence identity with the proteins used in training the classifier. Such a rate is 6-21% higher than the corresponding rates obtained by various existing NN (neural networks) and SVM (support vector machines) approaches, implying that the ensemble classifier is very promising and might become a useful vehicle in protein science, as well as proteomics and bioinformatics. AVAILABILITY The ensemble classifier, called PFP-Pred, is available as a web-server at http://202.120.37.186/bioinf/fold/PFP-Pred.htm for public usage.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Classifier Ensemble Framework: a Diversity Based Approach

Pattern recognition systems are widely used in a host of different fields. Due to some reasons such as lack of knowledge about a method based on which the best classifier is detected for any arbitrary problem, and thanks to significant improvement in accuracy, researchers turn to ensemble methods in almost every task of pattern recognition. Classification as a major task in pattern recognition,...

متن کامل

Combining Classifier Guided by Semi-Supervision

The article suggests an algorithm for regular classifier ensemble methodology. The proposed methodology is based on possibilistic aggregation to classify samples. The argued method optimizes an objective function that combines environment recognition, multi-criteria aggregation term and a learning term. The optimization aims at learning backgrounds as solid clusters in subspaces of the high...

متن کامل

Margin-based ensemble classifier for protein fold recognition

Recognition of protein folding patterns is an important step in protein structure and function predictions. Traditional sequence similarity-based approach fails to yield convincing predictions when proteins have low sequence identities, while the taxonometric approach is a reliable alternative. From a pattern recognition perspective, protein fold recognition involves a large number of classes w...

متن کامل

Combining Classifier Guided by Semi-Supervision

The article suggests an algorithm for regular classifier ensemble methodology. The proposed methodology is based on possibilistic aggregation to classify samples. The argued method optimizes an objective function that combines environment recognition, multi-criteria aggregation term and a learning term. The optimization aims at learning backgrounds as solid clusters in subspaces of the high...

متن کامل

Enhancing Protein Fold Prediction Accuracy Using an Ensemble of Different Classifiers

Protein fold prediction problem is considered as a key point to protein structure recognition and structural discoveries. Recent advances in pattern recognition field brought a great interest to apply pattern classification techniques to tackle this problem. From the pattern recognition point of view, the protein fold prediction problem can be expressed as a multi-class classification task that...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:
  • Bioinformatics

دوره 22 14  شماره 

صفحات  -

تاریخ انتشار 2006